Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bai Song

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Oct 31, 2025

Tao Liu, Chongyu Wang, Rongjie Li, Yingchen Yu, Xuming He, Bai Song

Figure 1 for GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Figure 2 for GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Figure 3 for GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Figure 4 for GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation

Abstract:While Multimodal Large Language Models (MLLMs) have advanced GUI navigation agents, current approaches face limitations in cross-domain generalization and effective history utilization. We present a reasoning-enhanced framework that systematically integrates structured reasoning, action prediction, and history summarization. The structured reasoning component generates coherent Chain-of-Thought analyses combining progress estimation and decision reasoning, which inform both immediate action predictions and compact history summaries for future steps. Based on this framework, we train a GUI agent, \textbf{GUI-Rise}, through supervised fine-tuning on pseudo-labeled trajectories and reinforcement learning with Group Relative Policy Optimization (GRPO). This framework employs specialized rewards, including a history-aware objective, directly linking summary quality to subsequent action performance. Comprehensive evaluations on standard benchmarks demonstrate state-of-the-art results under identical training data conditions, with particularly strong performance in out-of-domain scenarios. These findings validate our framework's ability to maintain robust reasoning and generalization across diverse GUI navigation tasks. Code is available at https://leon022.github.io/GUI-Rise.

* Published in NeurIPS 2025

Via

Access Paper or Ask Questions

VMRF: View Matching Neural Radiance Fields

Jul 06, 2022

Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu

Figure 1 for VMRF: View Matching Neural Radiance Fields

Figure 2 for VMRF: View Matching Neural Radiance Fields

Figure 3 for VMRF: View Matching Neural Radiance Fields

Figure 4 for VMRF: View Matching Neural Radiance Fields

Abstract:Neural Radiance Fields (NeRF) have demonstrated very impressive performance in novel view synthesis via implicitly modelling 3D representations from multi-view 2D images. However, most existing studies train NeRF models with either reasonable camera pose initialization or manually-crafted camera pose distributions which are often unavailable or hard to acquire in various real-world data. We design VMRF, an innovative view matching NeRF that enables effective NeRF training without requiring prior knowledge in camera poses or camera pose distributions. VMRF introduces a view matching scheme, which exploits unbalanced optimal transport to produce a feature transport plan for mapping a rendered image with randomly initialized camera pose to the corresponding real image. With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images. Extensive experiments over a number of synthetic and real datasets show that the proposed VMRF outperforms the state-of-the-art qualitatively and quantitatively by large margins.

Via

Access Paper or Ask Questions